Transient and asymptotic dynamics of reinforcement learning in games
نویسندگان
چکیده
Reinforcement learners tend to repeat actions that led to satisfactory outcomes in the past, and avoid choices that resulted in unsatisfactory experiences. This behavior is one of the most widespread adaptation mechanisms in nature. In this paper we fully characterize the dynamics of one of the best known stochastic models of reinforcement learning [Bush, R., Mosteller, F., 1955. Stochastic Models of Learning. Wiley & Sons, New York] for 2-player 2-strategy games. We also provide some extensions for more general games and for a wider class of learning algorithms. Specifically, it is shown that the transient dynamics of Bush and Mosteller’s model can be substantially different from its asymptotic behavior. It is also demonstrated that in general—and in sharp contrast to other reinforcement learning models in the literature—the asymptotic dynamics of Bush and Mosteller’s model cannot be approximated using the continuous time limit version of its expected motion. © 2007 Elsevier Inc. All rights reserved. JEL classification: C73
منابع مشابه
A Behavioral Learning Process in Games
The paper studies the cumulative proportional reinforcement (CPR) rule, according to which an agent plays, at each period, an action with a probability proportional to the cumulative utility that the agent has obtained with that action. the asymptotic properties of this learning process are examined for a decision-maker under risk, where it converges almost surely toward the expected utility ma...
متن کاملStochastic Stability of Perturbed Learning Automata in Positive-Utility Games
This paper considers a class of discrete-time reinforcement-learning dynamics and provides a stochasticstability analysis in repeatedly played positive-utility (strategicform) games. For this class of dynamics, convergence to pure Nash equilibria has been demonstrated only for the fine class of potential games. Prior work primarily provides convergence properties through stochastic approximatio...
متن کاملConvergent Multiple-times-scales Reinforcement Learning Algorithms in Normal Form Games
We consider reinforcement learning algorithms in normal form games. Using two-time-scales stochastic approximation, we introduce a modelfree algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surel...
متن کاملConvergent Multiple-timescales Reinforcement Learning Algorithms in Normal Form Games
We consider reinforcement learning algorithms in normal form games. Using two-timescales stochastic approximation, we introduce a modelfree algorithm which is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely...
متن کاملReinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic
In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Games and Economic Behavior
دوره 61 شماره
صفحات -
تاریخ انتشار 2007